vocational_unemp <- unemp_df[unemp_df$education_level == "some_college_assoc", ]

###### Convert to Time Series Object ######
ts_unemp <- ts(vocational_unemp$unemployment_rate, start = c(2019, 1), frequency = 12)
Analyze Lag Plots
Code
ts_lags(ts_unemp)
Overall, the lag plots do not show strong linear relationships at the higher-order lags. Lag 1 shows the strongest upward pattern, although it is not clearly linear. This suggests that this month’s unemployment rate for vocational graduates is moderately correlated with last month’s rate. Lags 2 and 3 also show positively sloped patterns, but beyond lag 4 the plots begin to look random.
This pattern indicates mild autocorrelation at short lags. Beyond lags 3-4, the lag plots show no sign of longer-range autocorrelation or seasonality.
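The visual reading of a lag plot can be backed by a number: each panel is a scatter of the series against a lagged copy of itself, so the correlation of those pairs summarizes how linear the panel is. The sketch below uses base R only and a synthetic AR(1) series as a stand-in for ts_unemp; the simulated data and the helper name lag_cor are assumptions made for illustration, not part of the analysis above.

```r
# Synthetic stand-in for the unemployment series (an assumption, not the report's data):
# an AR(1) process stored as a monthly series starting in January 2019.
set.seed(42)
x <- ts(as.numeric(arima.sim(model = list(ar = 0.6), n = 84)),
        start = c(2019, 1), frequency = 12)

# Hypothetical helper: correlation between x_t and x_{t-k},
# i.e. the linear association shown in the lag-k panel of a lag plot.
lag_cor <- function(series, k) {
  v <- as.numeric(series)
  n <- length(v)
  cor(v[(k + 1):n], v[1:(n - k)])
}

lag_cors <- sapply(1:6, function(k) lag_cor(x, k))
print(round(lag_cors, 2))

# Base-R counterpart of the lag-plot grid, for comparison with the numbers above
lag.plot(x, lags = 6, do.lines = FALSE)
```

Replacing x with ts_unemp would give the per-lag correlations for the actual series, which could then be compared against the visual impression described above.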
ACF and PACFs
Code
library(forecast)
Code
# ggAcf() has no `frequency` argument; ts_unemp already carries frequency = 12
ggAcf(ts_unemp)
Code
# ggPacf() has no `frequency` argument either
ggPacf(ts_unemp)
The ACF plot shows strong autocorrelation at the initial lags, indicating that last month’s unemployment rate is highly predictive of this month’s rate.
The ACF plot also suggests that the vocational graduates’ unemployment rate is non-stationary: the autocorrelations decay slowly rather than dropping off quickly, so observations far in the past still influence the current value. The PACF, with a very large spike at lag 1, likewise indicates that the current value depends heavily on the immediately preceding one.
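To make the "slow decay" signature concrete, the sketch below contrasts the sample ACF of a stationary series (white noise) with that of a non-stationary one (a random walk), using base R's acf() and pacf(). Both series are simulated, so this illustrates the pattern only and is not a result about the unemployment data.

```r
set.seed(1)
n <- 200
white_noise <- rnorm(n)           # stationary reference series (simulated)
random_walk <- cumsum(rnorm(n))   # non-stationary reference series (simulated)

# Sample autocorrelations at lags 1..12 (element 1 of $acf is lag 0, so drop it)
acf_wn <- acf(white_noise, lag.max = 12, plot = FALSE)$acf[-1]
acf_rw <- acf(random_walk, lag.max = 12, plot = FALSE)$acf[-1]

# Sample partial autocorrelations at lags 1..12 (pacf() starts at lag 1)
pacf_rw <- as.numeric(pacf(random_walk, lag.max = 12, plot = FALSE)$acf)

# White noise should hover near zero; the random walk should decay slowly
print(round(rbind(white_noise = acf_wn, random_walk = acf_rw), 2))

# For the random walk, the lag-1 partial autocorrelation dominates
print(round(pacf_rw, 2))
```

A series whose ACF resembles the random_walk row and whose PACF is dominated by lag 1 is a typical candidate for differencing.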
Code
tseries::adf.test(ts_unemp)
Augmented Dickey-Fuller Test
data: ts_unemp
Dickey-Fuller = -2.7152, Lag order = 4, p-value = 0.2829
alternative hypothesis: stationary
The p-value from the ADF test (0.2829) is greater than 0.05, so we do not have enough evidence to reject the null hypothesis of a unit root at the 5% significance level. This is consistent with the series being non-stationary, which aligns with the ACF and PACF plots above.
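For reference, the hypotheses behind this test can be written out. The form below is the standard ADF regression with a constant and a linear trend, which is our understanding of what tseries::adf.test fits. The symbols are notation introduced here for illustration, with $y_t$ standing for the unemployment rate in month $t$ and $k = 4$ matching the reported lag order.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t ,
\qquad k = 4

H_0 : \gamma = 0 \quad \text{(unit root, non-stationary)}
\qquad \text{vs.} \qquad
H_1 : \gamma < 0 \quad \text{(stationary)}
```

The reported Dickey-Fuller statistic of -2.7152 is the test statistic for $\gamma$. With a p-value of 0.2829, $H_0$ is not rejected at the 5% level.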
Warning: Removed 6 rows containing missing values or values outside the scale range
(`geom_line()`).
Seasonal: The seasonal component represents regular, repeating patterns throughout the year. While the plot appears to show a consistent yearly rise followed by a dip, this pattern does not seem to represent true seasonality, since neither the lag plots nor the ACF and PACF plots generated earlier show any. Classical decomposition always estimates a yearly seasonal component, so the pattern shown here more likely reflects that built-in assumption than genuine seasonality in the unemployment rate.
Trend: The trend line represents the underlying pattern of unemployment over a longer period, excluding short-term fluctuations. The trend shows a rapid increase from 2019 to 2020, followed by a steady decline that began in late 2020 and has continued through 2025. This clearly reflects the spike in unemployment caused by COVID-19 and the economy’s slow recovery afterward.
Remainder: The remainder component captures the random, unpredictable variation in the data that is not explained by the trend or seasonality. The remainder clearly shows the steep spike in vocational graduates’ unemployment rate at the start of 2020 due to the COVID-19 pandemic. After 2020, the remainder values decrease and stabilize, showing that the shocks caused by COVID-19 diminished once the economy began to recover.
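The caution about the seasonal component can be checked on a series that is known to have no seasonality. The sketch below applies base R's decompose() to a simulated random walk, which is an assumption chosen for illustration and not the report's data. Classical decomposition still returns twelve monthly seasonal figures that repeat every year.

```r
set.seed(7)
# Synthetic, non-seasonal monthly series (an assumption): a random walk
y <- ts(cumsum(rnorm(84)), start = c(2019, 1), frequency = 12)

dec <- decompose(y)  # additive classical decomposition

# Twelve seasonal figures are estimated even though y has no true seasonality;
# they are centered, so they sum to zero
print(round(dec$figure, 2))

# The seasonal component is just those twelve figures repeated every year
print(isTRUE(all.equal(dec$seasonal[1:12], dec$seasonal[13:24])))

# The centered moving-average trend is undefined at both ends of the series
print(sum(is.na(dec$trend)))

# Rough scale check: seasonal swing versus the spread of the remainder
print(c(seasonal_range = diff(range(dec$seasonal)),
        remainder_sd   = sd(dec$random, na.rm = TRUE)))
```

A seasonal swing that is small relative to the remainder, together with no seasonal spikes in the ACF, points toward an artifact of the method rather than a real yearly cycle.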
Decomposition
Code
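# First- and second-order differences of the series
diff_1 <- diff(ts_unemp)
diff_2 <- diff(ts_unemp, differences = 2)

# ACF of each differenced series, up to lag 50
acf_plot_1 <- ggAcf(diff_1, 50) +
  ggtitle("ACF of First-Order Differenced Series") +
  theme_minimal()
acf_plot_2 <- ggAcf(diff_2, 50) +
  ggtitle("ACF of Second-Order Differenced Series") +
  theme_minimal()

# Stack the two plots vertically (the `/` operator comes from patchwork)
acf_plot_1 / acf_plot_2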
Taken together, the ACF and PACF plots of the first- and second-order differenced series suggest that the data needs only one round of differencing. In the first-order differenced ACF, the series appears near-stationary: most autocorrelations fall within the confidence bands and fluctuate around zero, with no slow decay. The second-order differenced ACF has a very large negative lag-1 autocorrelation, which is a sign that the series has been over-differenced.
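The over-differencing signature can be reproduced on simulated data. A random walk needs exactly one difference to become white noise. Differencing it a second time produces a non-invertible MA(1) whose theoretical lag-1 autocorrelation is -0.5 and whose variance is twice as large. The sketch below is a base-R illustration on synthetic data, not a computation on ts_unemp.

```r
set.seed(123)
rw <- cumsum(rnorm(500))          # simulated random walk: one difference is enough

d1 <- diff(rw)                    # correctly differenced (white noise)
d2 <- diff(rw, differences = 2)   # over-differenced

# Lag-1 sample autocorrelation (element 1 of $acf is lag 0)
r1_d1 <- acf(d1, plot = FALSE)$acf[2]
r1_d2 <- acf(d2, plot = FALSE)$acf[2]
print(round(c(first_diff = r1_d1, second_diff = r1_d2), 2))

# Over-differencing also inflates the spread of the series
print(round(c(sd_first = sd(d1), sd_second = sd(d2)), 2))
```

A lag-1 autocorrelation near -0.5 after differencing, combined with a larger standard deviation than the previous order of differencing, is a common rule-of-thumb indicator that one difference too many was taken.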
Code
p1 <- ggPacf(diff_1, 50) +
  ggtitle("PACF of First Differenced Series") +
  theme_minimal()
p2 <- ggPacf(diff_2, 50) +
  ggtitle("PACF of Second Differenced Series") +
  theme_minimal()

p1 / p2
The first-order differenced PACF tells a similar story, with most partial autocorrelations falling within the confidence bounds and no signs of trend or seasonality. The second-order differenced PACF again shows a very large negative value at lag 1, followed by a pattern of negative partial autocorrelations. This likely means that second-order differencing has introduced more structure than this data needs.
Code
library(knitr)
library(kableExtra)
Code
# Candidate orders: p and q in 0..2, with d fixed at 1 (one round of differencing)
p_range <- 0:2
d_range <- 1
q_range <- 0:2

n_combinations <- length(p_range) * length(d_range) * length(q_range)
results_matrix <- matrix(NA, nrow = n_combinations, ncol = 6)

# Fit every (p, d, q) combination and record its information criteria
i <- 1
for (q in q_range) {
  for (p in p_range) {
    d <- d_range
    model <- Arima(ts_unemp, order = c(p, d, q), include.drift = TRUE)
    results_matrix[i, ] <- c(p, d, q, model$aic, model$bic, model$aicc)
    i <- i + 1
  }
}

results_df <- as.data.frame(results_matrix)
colnames(results_df) <- c("p", "d", "q", "AIC", "BIC", "AICc")

# Rows with the lowest AIC and the lowest BIC
highlight_aic_row <- which.min(results_df$AIC)
highlight_bic_row <- which.min(results_df$BIC)

# Yellow marks the AIC minimum; green marks the BIC minimum
# (green overrides yellow when both criteria pick the same row)
knitr::kable(results_df, align = 'c', caption = "Comparison of ARIMA Models") %>%
  kable_styling(full_width = FALSE, position = "center") %>%
  row_spec(c(highlight_aic_row, highlight_bic_row), bold = TRUE, background = "#FFFF99") %>%
  row_spec(highlight_bic_row, bold = TRUE, background = "#90EE90")